Investigation into bottle-neck features for meeting speech recognition
نویسندگان
چکیده
This work investigates into recently proposed Bottle-Neck features for ASR. The bottle-neck ANN structure is imported into Split Context architecture gaining significant WER reduction. Further, Universal Context architecture was developed which simplifies the system by using only one universal ANN for all temporal splits. Significant WER reduction can be obtained by applying fMPE on top of our BN features as a technique for discriminative feature extraction and further gain is also obtained by retraining model parameters using MPE criterion. The results are reported on meeting data from RT07 evaluation.
منابع مشابه
Investigating the learning effect of multilingual bottle-neck features for ASR
Deep neural networks (DNNs) have become state-of-the-art techniques of automatic speech recognition in the last few years. They can be used at the preprocessing level (Tandem or BottleNeck features) or at the acoustic model level (hybrid Hidden Markov Model/DNN). Moreover, they allow exploiting multilingual data to improve monolingual systems. This paper presents our investigation of the learni...
متن کامل(Deep) Neural Networks
This work continues in development of the recently proposed Bottle-Neck features for ASR. A five-layers MLP used in bottleneck feature extraction allows to obtai arbitrary feature size without dimensionality reduction by transforms, independently on the MLP training targets. The MLP topology – number and sizes of layers, suitable training targets, the impact of output feature transforms, the ne...
متن کاملHierarchical neural net architectures for feature extraction in ASR
This paper presents the use of neural net hierarchy for feature extraction in ASR. The recently proposed Bottle-Neck feature extraction is extended and used in hierarchical structures to enhance the discriminative property of the features. Although many ways of hierarchical classification/feature extraction have been proposed, we restricted ourselves to use the outputs of the first stage neural...
متن کاملClassification of emotional speech using spectral pattern features
Speech Emotion Recognition (SER) is a new and challenging research area with a wide range of applications in man-machine interactions. The aim of a SER system is to recognize human emotion by analyzing the acoustics of speech sound. In this study, we propose Spectral Pattern features (SPs) and Harmonic Energy features (HEs) for emotion recognition. These features extracted from the spectrogram ...
متن کاملBUT 2014 Babel system: analysis of adaptation in NN based systems
Features based on a hierarchy of neural networks with compressive layers – Stacked Bottle-Neck (SBN) features – were recently shown to provide excellent performance in LVCSR systems. This paper summarizes several techniques investigated in our work towards Babel 2014 evaluations: (1) using several versions of fundamental frequency (F0) estimates, (2) semi-supervised training on un-transcribed d...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2009